Delft3D Performance Benchmarking Report
Authors
Abstract
The Delft3D modelling suite has been ported to the PRACE Tier-0 and Tier-1 infrastructure. The portability of Delft3D was improved by removing platform-dependent options from the build system and replacing non-standard constructs in the source. Three benchmarks were used to investigate the scaling of Delft3D: (1) a large, regular domain; (2) a realistic, irregular domain with a low fill factor; (3) a regular domain with a sediment transport module. The first benchmark clearly shows good scalability up to a thousand cores for a suitable problem. The other benchmarks show reasonable scalability up to about 100 cores. For test case (2) the main bottleneck is the serialized I/O. An attempt was made to implement a separate I/O server by dedicating the last MPI process to I/O, but this work is not yet finished. The imbalance due to the irregular domain can be reduced somewhat by using a cyclic placement of MPI tasks. Test case (3) benefits from inlining of frequently called routines.

Introduction

Delft3D [1] is a world-leading 3D modelling suite used to investigate hydrodynamics, sediment transport, morphology and water quality for fluvial, estuarine and coastal environments. As of 1 January 2011, the Delft3D flow (FLOW), morphology (MOR) and waves (WAVE) modules are available as open source. Delft3D comprises over 350k lines of code and is developed by Deltares. The software is used and has proven its capabilities all over the world, e.g. in the Netherlands, the USA, Hong Kong, Singapore, Australia and Venice. It is continuously improved and extended with innovative, advanced modelling techniques through the research work of Deltares, and is intended to remain a world-leading software package.
Description

The FLOW module is the heart of Delft3D; it is a multi-dimensional (2D or 3D) hydrodynamic (and transport) simulation programme which calculates non-steady flow and transport phenomena resulting from tidal and meteorological forcing on a curvilinear, boundary-fitted grid or in spherical coordinates. A more flexible grid approach is under development. In 3D simulations, the vertical grid is defined following the so-called sigma-coordinate approach or the Z-layer approach. The MOR module computes sediment transport (both suspended load and total bed load) and morphological changes for an arbitrary number of cohesive and non-cohesive fractions.

* Corresponding author. E-mail address: [email protected]
b Deltares is an independent institute for applied research in the field of water, subsurface and infrastructure. For more information, see http://www.deltares.nl/en

The application consists mainly of Fortran 90, with some routines in C and C++ and some features from Fortran 2003. The parallel version that we considered uses MPI with a 1-D domain decomposition as its parallelisation strategy: the longest dimension (i.e. the direction with the most grid points) is automatically selected and split across the MPI processes. Delft3D uses an alternating direction implicit (ADI) method to solve the momentum and continuity equations. The parallel implementation of the ADI method uses the halo regions of the computational subdomain to some extent as internal boundary conditions for local iterations within each process. Therefore, convergence could become a problem when scaling up to higher process counts. I/O is implemented using a master-only technique. Although the application should scale well, as shown by similar models with the same input data set, this was not the case on local hardware at Deltares.
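The 1-D decomposition described above can be sketched as follows. The function below is a hypothetical illustration, not the actual Delft3D partitioning code: it selects the longer of the two horizontal dimensions and splits it into near-equal contiguous strips, one per MPI process.

```python
def partition_longest_dimension(nx, ny, nprocs):
    """Split the longest horizontal dimension into near-equal contiguous
    strips, one per MPI process.  Returns a list of (start, stop) index
    ranges (0-based, stop exclusive), one tuple per rank."""
    n = max(nx, ny)                  # Delft3D partitions the longest dimension
    base, extra = divmod(n, nprocs)  # spread the remainder over the first ranks
    bounds, start = [], 0
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

# Example: a 9M-point grid of 4500 x 2000 cells split over 8 processes;
# the 4500-cell dimension is partitioned, the 2000-cell dimension is not.
strips = partition_longest_dimension(4500, 2000, 8)
```

Note that such a block decomposition assigns each rank one contiguous strip; with an irregular domain the work per strip can differ greatly, which is the load imbalance the cyclic task placement mentioned in the abstract tries to mitigate.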
The developers could not create an insightful profile that would reveal the main bottleneck preventing scalability. The MPI routines are wrapped in custom routines that are mostly used for halo exchanges and for the reduction of convergence parameters. Halo exchanges are executed by two calls to MPI_Isend and MPI_Irecv, immediately followed by separate calls to MPI_Wait for all communication. The haloes are stored in temporary arrays.

Configuration and setup

The program uses automake, autoconf and libtool to create a configure script, which then configures the whole package for a particular system. For now, the only way to obtain the sources is to check them out from the svn repository. A packaged version of the Delft3D source is expected to come with a single configure script that can target any platform; at the moment, a few platform-specific options in the configure.ac file still prevent this.

Porting

Delft3D has been ported to and tested on the following systems:

- IBM Power6 system "Huygens" running Linux at SURFsara (IBM XL compilers + IBM POE)
- Intel Xeon Nehalem cluster "Lisa" running Linux at SURFsara (Intel compilers + OpenMPI)
- BullX Intel Xeon cluster "Curie" running Linux at CEA (Intel compilers + BullxMPI)
- BullX Intel Xeon cluster "Cartesius" running Linux at SURFsara (Intel compilers + Intel MPI)

During the porting to these systems, several portability issues were identified and fixed in the mainstream releases of Delft3D:

- The MPI implementation in Delft3D checked the environment variable PMI_RANK, which is only used by the MPICH2 library and is therefore not portable. New releases of Delft3D have fixed this and now support MPICH2, Intel MPI, MVAPICH, OpenMPI and POE.
- In case of an abnormal exit, the code would write an error message to the output file but did not close it. Some compilers (e.g. IBM's) buffer the output, so the error message was never written to disk. This has been fixed.
- An erroneous attempt to de-allocate a static object caused a runtime error with the IBM compiler; this has been fixed.
- MPI_Finalize was not called when there was only one MPI task, in which case Scalasca would not write its final report. This is now fixed.
- Variables of LOGICAL and INTEGER type were used interchangeably; this is now fixed.
- A bug caused by an assumed pointer size of 4 bytes; this is now fixed.
- When running the first benchmark with 3 MPI tasks or more, a signed integer overflow occurred when multiplying the total number of grid points (9M) by the running sum of the CPU weights (in this case 300). This can be fixed by using INTEGER*8 variables.
- The OpenMP option in configure.ac was updated.
- When opening a file, the non-standard argument access='append' was used; this has been replaced with the standard argument position='append'.
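The integer overflow listed above is easy to reproduce: 9 million grid points times a weight sum of 300 gives 2.7e9, which exceeds the 4-byte signed integer maximum of 2,147,483,647. A minimal illustration in Python, using ctypes to mimic Fortran INTEGER*4 and INTEGER*8 arithmetic (the constants are the benchmark figures quoted above):

```python
import ctypes

grid_points = 9_000_000   # total number of grid points in benchmark 1
weight_sum  = 300         # running sum of the CPU weights

exact   = grid_points * weight_sum       # Python ints are arbitrary precision
as_int4 = ctypes.c_int32(exact).value    # what INTEGER*4 arithmetic yields: wraps negative
as_int8 = ctypes.c_int64(exact).value    # what INTEGER*8 arithmetic yields: correct
```

Here `as_int4` wraps around to a negative value, while `as_int8` holds the correct product, which is why widening the variables to INTEGER*8 resolves the problem.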
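The halo-exchange pattern described in the profiling discussion can also be illustrated schematically. The sketch below is a serial, MPI-free model of the data movement only; the real code posts MPI_Isend/MPI_Irecv pairs and then calls MPI_Wait. As in the Delft3D wrappers, boundary strips are first packed into temporary arrays, which are then delivered into the neighbours' halo cells.

```python
def exchange_halos(subdomains, halo=1):
    """Serial model of a 1-D halo exchange.
    Each subdomain is a list: [left halo | interior | right halo].
    Boundary strips are packed into temporary send buffers, then
    written into the neighbouring subdomains' halo regions."""
    # Stage 1: pack boundary strips into temporary arrays (the "send buffers")
    send_left  = [d[halo:2 * halo] for d in subdomains]    # first interior cells
    send_right = [d[-2 * halo:-halo] for d in subdomains]  # last interior cells
    # Stage 2: deliver the buffers into the neighbours' halo cells
    for i, d in enumerate(subdomains):
        if i > 0:                          # fill left halo from left neighbour
            d[:halo] = send_right[i - 1]
        if i < len(subdomains) - 1:        # fill right halo from right neighbour
            d[-halo:] = send_left[i + 1]
    return subdomains

# Two subdomains with halo width 1; 0 marks uninitialised halo cells.
doms = [[0, 1, 2, 0], [0, 3, 4, 0]]
exchange_halos(doms)
```

After the exchange, each subdomain's halo cells mirror the neighbour's boundary values, which the ADI iterations then treat as internal boundary conditions.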